LCORE-1446: Add support for metadata filters in Solr vector search #1581
anik120 wants to merge 1 commit into lightspeed-core:main from
Conversation
Walkthrough: Adds structured Solr metadata filter support and documentation; updates query-parameter normalization to extract nested
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes. Pre-merge checks: ✅ 5 passed.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/models/requests.py (1)
127-150: ⚠️ Potential issue | 🟡 Minor: Update the canonical `filters` description to match the new routing.
`filters` is no longer always passed through as `params["solr"]`; structured metadata filters are extracted to top-level `params["filters"]` in `src/utils/vector_search.py`. Keeping this doc/field description stale conflicts with the new examples.
Proposed doc update:

```diff
 class SolrVectorSearchRequest(BaseModel):
     """LCORE Solr inline RAG options for ``vector_io.query`` (mode and provider filters).

     Attributes:
         mode: Solr vector_io search mode. When omitted, the server default (hybrid) is used.
-        filters: Solr provider filter payload passed through as params['solr'].
+        filters: Solr provider filter payload. Structured metadata filters under
+            a nested ``filters`` key are sent to top-level ``params["filters"]``;
+            remaining legacy Solr options are passed under ``params["solr"]``.
@@
     filters: Optional[dict[str, Any]] = Field(
         None,
-        description="Solr provider filter payload passed through as params['solr'].",
+        description=(
+            "Solr provider filter payload. Structured metadata filters under "
+            "a nested 'filters' key are sent as top-level vector_io filters; "
+            "remaining legacy Solr options are passed under params['solr']."
+        ),
         examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 127 - 150, Update the Field description for the filters attribute in the LCORE Solr inline RAG options (the filters field defined in this model used by vector_io.query) to reflect the new routing: explain that structured metadata filters are extracted to top-level params["filters"] while remaining Solr-specific payloads are kept under params["solr"], and adjust the example text accordingly so it no longer states that filters are always passed through as params["solr"]; reference the extraction logic in src/utils/vector_search.py when rewording to make the distinction clear.
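The routing this comment describes (structured metadata filters lifted to top-level `params["filters"]`, everything else kept under `params["solr"]`) can be sketched as follows. `_build_query_params` and `SolrVectorSearchRequest` are names from the PR; the stub class, function body, and default `k` below are an illustrative reconstruction under that description, not the project's actual code.

```python
from typing import Any, Optional


class SolrRequestStub:
    """Hypothetical stand-in for SolrVectorSearchRequest (illustration only)."""

    def __init__(
        self,
        mode: Optional[str] = None,
        filters: Optional[dict[str, Any]] = None,
    ) -> None:
        self.mode = mode
        self.filters = filters


def build_query_params(solr: Optional[SolrRequestStub], default_k: int = 5) -> dict[str, Any]:
    """Sketch of the routing: structured filters go top-level, legacy ones under 'solr'.

    default_k stands in for SOLR_VECTOR_SEARCH_DEFAULT_K from the project's constants.
    """
    params: dict[str, Any] = {"k": default_k}
    if solr is None or not solr.filters:
        return params
    # Nested structured metadata filters are lifted to top-level params["filters"].
    if "filters" in solr.filters:
        params["filters"] = solr.filters["filters"]
    # Everything else (legacy fq, custom options) stays under params["solr"].
    remaining = {k: v for k, v in solr.filters.items() if k != "filters"}
    if remaining:
        params["solr"] = remaining
    return params
```

With a mixed payload, the structured part and the legacy `fq` end up in different keys, which is the behavioral split the review comments keep referring to.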
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/requests.py`:
- Around line 290-312: Reformat the unformatted nested dict/list literals in the
Solr examples block (the entries containing "mode": "hybrid", "mode":
"semantic", and the {"filters": {"fq": ["product:*openshift*"]}} entry) to
comply with Black style; run Black (or your project's pre-commit formatting) on
src/models/requests.py so the nested dictionaries and lists are reformatted
consistently (wrap long lines, align brackets and commas) and commit the
reflowed code.
In `@src/utils/vector_search.py`:
- Line 90: Reformat the long dict comprehension into Black-friendly multi-line
form: break the expression that assigns remaining_filters from
solr.filters.items() across lines so Black won't reflow it; locate the
assignment to remaining_filters (the comprehension over solr.filters.items())
and split the dict literal, the for clause, and the if clause onto separate
lines consistent with Black's preferred formatting.
- Around line 61-68: Update the docstring for the function that builds the
parameter dictionary for vector_io.query (the code handling the solr argument)
to clarify that only structured filters (the structured Solr request: mode and
structured filters) are lifted to the top level, while legacy/unnested filter
payloads remain nested under params["solr"]; explicitly mention the solr
argument and that legacy filters are preserved under params["solr"] so callers
know which filters are moved vs retained.
In `@tests/unit/utils/test_vector_search.py`:
- Around line 83-136: Reformat the inline
SolrVectorSearchRequest.model_validate(...) test fixtures in
tests/unit/utils/test_vector_search.py to Black-compliant multi-line style with
trailing commas (for each dict passed to
SolrVectorSearchRequest.model_validate), then re-run Black/flake formatting;
focus on the fixtures in test_with_filters_and_other_solr_params,
test_with_compound_filter, and the earlier test block that calls
_build_query_params(solr=solr) so the dict literals are split across lines,
keys/values have trailing commas, and the file passes Black. Use the same symbol
references (SolrVectorSearchRequest.model_validate and _build_query_params) to
locate and update the offending calls.
---
Outside diff comments:
In `@src/models/requests.py`:
- Around line 127-150: Update the Field description for the filters attribute in
the LCORE Solr inline RAG options (the filters field defined in this model used
by vector_io.query) to reflect the new routing: explain that structured metadata
filters are extracted to top-level params["filters"] while remaining
Solr-specific payloads are kept under params["solr"], and adjust the example
text accordingly so it no longer states that filters are always passed through
as params["solr"]; reference the extraction logic in src/utils/vector_search.py
when rewording to make the distinction clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7227812e-62f3-4284-997b-ba1189f40da7
📒 Files selected for processing (3)
src/models/requests.py, src/utils/vector_search.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: radon
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: integration_tests (3.13)
- GitHub Check: integration_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
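A minimal module following the guidelines above might look like the sketch below. `get_logger`/`log.py` are the project helpers named in the guidelines, stubbed here with stdlib `logging` so the example is self-contained; `DEFAULT_MODE` and `get_search_mode` are invented for illustration.

```python
"""Illustrative module demonstrating the project's Python conventions (sketch only)."""

import logging
from typing import Final, Optional


# In the real project this would be: from log import get_logger
def get_logger(name: str) -> logging.Logger:
    """Stand-in for the project's log.get_logger helper."""
    return logging.getLogger(name)


logger = get_logger(__name__)

# Constants carry Final[...] type hints.
DEFAULT_MODE: Final[str] = "hybrid"


def get_search_mode(requested: Optional[str]) -> str:
    """Return the requested Solr search mode, falling back to the default.

    Parameters:
        requested: Mode from the request, or None to use the default.

    Returns:
        The effective search mode.
    """
    # Return a new value rather than mutating the input in place.
    return requested if requested is not None else DEFAULT_MODE
```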
Files:
src/utils/vector_search.py, src/models/requests.py, tests/unit/utils/test_vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/utils/vector_search.py, src/models/requests.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use `pytest-mock` for AsyncMock objects in tests
Use marker `pytest.mark.asyncio` for async tests
Unit tests require 60% coverage, integration tests 10%
Files:
tests/unit/utils/test_vector_search.py
🧠 Learnings (4)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/utils/vector_search.py, src/models/requests.py, tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.
Applied to files:
src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py
[error] 1-1: Black --check would reformat this file.
src/models/requests.py
[error] 1-1: Black --check would reformat this file.
tests/unit/utils/test_vector_search.py
[error] 1-1: Black --check would reformat this file.
🔇 Additional comments (1)
src/models/requests.py (1)
777-779: ResponsesRequest doc update looks aligned. The docstring now advertises structured and legacy Solr filter support for the Responses API request model.
```python
        logger.debug("Extracted filters from solr.filters: %s", params["filters"])

        # Pass remaining solr.filters content (legacy fq, etc.) to params["solr"]
        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
```

Format the comprehension with Black.
CI is already failing Black; this long single-line dict comprehension is one of the new spots that will be rewritten.
Black-style formatting:

```diff
-        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
+        remaining_filters = {
+            k: v for k, v in solr.filters.items() if k != "filters"
+        }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` at line 90, Reformat the long dict comprehension
into Black-friendly multi-line form: break the expression that assigns
remaining_filters from solr.filters.items() across lines so Black won't reflow
it; locate the assignment to remaining_filters (the comprehension over
solr.filters.items()) and split the dict literal, the for clause, and the if
clause onto separate lines consistent with Black's preferred formatting.
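As a sanity check, Black's multi-line rewrite of the comprehension is behavior-identical to the single-line form; `solr_filters` below is a hypothetical payload invented for the comparison, not data from the PR.

```python
# Hypothetical stand-in for the solr.filters payload discussed in the review.
solr_filters = {
    "filters": {"type": "eq", "key": "platform", "value": "openshift"},
    "fq": ["product:*openshift*"],
}

# Single-line form flagged by Black's line-length check.
one_line = {k: v for k, v in solr_filters.items() if k != "filters"}

# Multi-line form in the style Black produces; same keys, same values.
multi_line = {
    k: v for k, v in solr_filters.items() if k != "filters"
}

assert one_line == multi_line == {"fq": ["product:*openshift*"]}
```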
Updates the Solr vector search integration to support the new Filter API
introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG
results using comparison and compound filters.
Filter format examples:
- Simple: `{"filters": {"type": "eq", "key": "platform", "value": "openshift"}}`
- Compound: `{"filters": {"type": "and", "filters": [...]}}`
Note: This change requires lightspeed-providers with solr_vector_io filter support,
introduced in lightspeed-core/lightspeed-providers#119
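The two filter shapes above can be built and checked with a small sketch; the `type`/`key`/`value` schema follows the examples in this commit message, while the helper names (`eq`, `and_`, `validate_filter`) are invented for illustration and are not part of the PR or llama-stack.

```python
from typing import Any


def eq(key: str, value: Any) -> dict[str, Any]:
    """Build a simple comparison filter in the shape shown above."""
    return {"type": "eq", "key": key, "value": value}


def and_(*filters: dict[str, Any]) -> dict[str, Any]:
    """Combine filters into a compound AND filter."""
    return {"type": "and", "filters": list(filters)}


def validate_filter(f: dict[str, Any]) -> bool:
    """Recursively check that a filter dict matches the documented shapes."""
    if f.get("type") == "and":
        # Compound filters nest a list of sub-filters.
        return all(validate_filter(sub) for sub in f.get("filters", []))
    # Comparison filters need both a key and a value.
    return "key" in f and "value" in f


payload = {"filters": and_(eq("platform", "openshift"), eq("status", "released"))}
```

A payload built this way nests cleanly under the request's `filters` key, matching the compound example in the commit message.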
b6f480c to 3b5b567 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/models/requests.py (1)
290-312: ⚠️ Potential issue | 🟡 Minor: Run Black on the Solr examples.
CI still reports `src/models/requests.py` would be reformatted; this example block has missing trailing commas and a long inline dict/list literal.
Black-style formatting:

```diff
 {
     "mode": "hybrid",
     "filters": {
         "filters": {
             "type": "eq",
             "key": "platform",
-            "value": "openshift"
+            "value": "openshift",
         }
-    }
+    },
 },
 {
     "mode": "semantic",
     "filters": {
         "filters": {
             "type": "and",
             "filters": [
                 {"type": "eq", "key": "platform", "value": "openshift"},
-                {"type": "in", "key": "version", "value": ["4.14", "4.15", "4.16"]}
-            ]
+                {
+                    "type": "in",
+                    "key": "version",
+                    "value": ["4.14", "4.15", "4.16"],
+                },
+            ],
         }
-    }
+    },
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 290-312, the example Solr query literals (the inline dict/list block containing the hybrid/semantic/filter entries) are not Black-formatted: add the missing trailing commas and reformat the long inline dict/list across multiple lines to satisfy Black; specifically update the three example entries (the dict with "mode": "hybrid", the dict with "mode": "semantic" and nested "filters" list, and the final {"filters": {"fq": ["product:*openshift*"]}} entry) so each nested dict/list element has proper trailing commas and line breaks, then run Black (or apply Black formatting) to the file.

src/utils/vector_search.py (2)
61-68: ⚠️ Potential issue | 🟡 Minor: Clarify that only structured filters are extracted.
Legacy Solr filters remain under `params["solr"]`, so the return docs should not imply every filter moves to top-level `params["filters"]`.
Docstring update:

```diff
     Args:
         solr: Optional structured Solr request (mode and filters from the API).
             - mode: Solr search mode (semantic, hybrid, lexical)
-            - filters: Solr filter payload, may contain structured metadata filters
+            - filters: Solr filter payload; nested structured metadata filters are
+              extracted, while legacy Solr filters remain under ``params["solr"]``

     Returns:
-        Parameter dictionary for ``vector_io.query`` with extracted filters at top level.
+        Parameter dictionary for ``vector_io.query``. Structured metadata filters are
+        placed at top level; legacy Solr filters remain under ``params["solr"]``.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 61 - 68, Update the docstring for the function that builds parameters for vector_io.query to make clear that only structured Solr filters (the `solr["filters"]` structured payload) are extracted into top-level `params["filters"]`, while legacy/other Solr filters remain nested under `params["solr"]`; explicitly mention the `solr` argument, `params["filters"]`, and `params["solr"]` to avoid implying all filters are lifted to top level.
90: ⚠️ Potential issue | 🟡 Minor: Format the dict comprehension with Black.
CI reports this file would be reformatted; split the long comprehension before merging.
Black-style formatting:

```diff
-        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
+        remaining_filters = {
+            k: v for k, v in solr.filters.items() if k != "filters"
+        }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` at line 90, The dict comprehension assigning remaining_filters is too long for Black; reformat it into a multi-line comprehension (wrap the comprehension in parentheses and place the for/if parts on separate indented lines) so Black won't reformat it in CI—e.g., break the expression that builds remaining_filters from solr.filters across multiple lines and align the key/value pair, for-loop, and conditional onto their own lines while preserving the same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/utils/test_vector_search.py`:
- Around line 109-151: Update the two tests
test_with_filters_and_other_solr_params and test_with_compound_filter to assert
the full structured filter payload returned by _build_query_params (not just
type/key/length); for the first test, verify params["filters"] equals the full
dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.
---
Duplicate comments:
In `@src/models/requests.py`:
- Around line 290-312: The example Solr query literals in src/models/requests.py
(the inline dict/list block containing the hybrid/semantic/filter entries) are
not Black-formatted — add the missing trailing commas and reformat the long
inline dict/list across multiple lines to satisfy Black; specifically update the
three example entries (the dict with "mode": "hybrid", the dict with "mode":
"semantic" and nested "filters" list, and the final {"filters": {"fq":
["product:*openshift*"]}} entry) so each nested dict/list element has proper
trailing commas and line breaks, then run Black (or apply Black formatting) to
the file.
In `@src/utils/vector_search.py`:
- Around line 61-68: Update the docstring for the function that builds
parameters for vector_io.query to make clear that only structured Solr filters
(the `solr["filters"]` structured payload) are extracted into top-level
`params["filters"]`, while legacy/other Solr filters remain nested under
`params["solr"]`; explicitly mention the `solr` argument, `params["filters"]`,
and `params["solr"]` to avoid implying all filters are lifted to top level.
- Line 90: The dict comprehension assigning remaining_filters is too long for
Black; reformat it into a multi-line comprehension (wrap the comprehension in
parentheses and place the for/if parts on separate indented lines) so Black
won't reformat it in CI—e.g., break the expression that builds remaining_filters
from solr.filters across multiple lines and align the key/value pair, for-loop,
and conditional onto their own lines while preserving the same logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f394fddb-4d56-45f0-a386-7dc57aa59c34
📒 Files selected for processing (3)
src/models/requests.py, src/utils/vector_search.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: radon
- GitHub Check: mypy
- GitHub Check: check_dependencies
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.
Applied to files:
src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py
[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/utils/vector_search.py' (or 'uv tool run black --write .').
src/models/requests.py
[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/models/requests.py' (or 'uv tool run black --write .').
🔇 Additional comments (4)
src/models/requests.py (2)
285-287: Solr filter description looks aligned. This now documents structured comparison filters while preserving legacy filter-only compatibility.
777-779: Responses API Solr docs look good. The request docstring now reflects structured metadata filters and legacy compatibility.
tests/unit/utils/test_vector_search.py (2)
70-107: Good coverage for legacy and simple structured filters. These cases verify the key behavioral split: legacy `fq` remains under `params["solr"]`, while structured metadata filters move to top-level `params["filters"]`.
163-180: Mode plus legacy filter tests look good. These keep coverage for request mode overrides and default mode behavior while preserving legacy Solr filter passthrough.
```python
    def test_with_filters_and_other_solr_params(self) -> None:
        """Test parameters with both filters and other solr-specific params."""
        solr = SolrVectorSearchRequest.model_validate(
            {
                "filters": {
                    "filters": {
                        "type": "in",
                        "key": "version",
                        "value": ["4.14", "4.15"],
                    },
                    "custom_param": "value",
                },
            },
        )
        params = _build_query_params(solr=solr)

        # Filters extracted to top-level
        assert params["filters"]["type"] == "in"
        assert params["filters"]["key"] == "version"
        # Other params remain under solr key
        assert params["solr"] == {"custom_param": "value"}
        assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K

    def test_with_compound_filter(self) -> None:
        """Test parameters with compound AND filter."""
        solr = SolrVectorSearchRequest.model_validate(
            {
                "filters": {
                    "filters": {
                        "type": "and",
                        "filters": [
                            {"type": "eq", "key": "platform", "value": "openshift"},
                            {"type": "ne", "key": "status", "value": "archived"},
                        ],
                    },
                },
            },
        )
        params = _build_query_params(solr=solr)

        assert params["filters"]["type"] == "and"
        assert len(params["filters"]["filters"]) == 2
        assert "solr" not in params
```
🧹 Nitpick | 🔵 Trivial: Assert the full structured filter payload.
The mixed and compound tests should verify values/nested filters are preserved, not only type, key, and list length.
Strengthen assertions:

```diff
     # Filters extracted to top-level
     assert params["filters"]["type"] == "in"
     assert params["filters"]["key"] == "version"
+    assert params["filters"]["value"] == ["4.14", "4.15"]
     # Other params remain under solr key
     assert params["solr"] == {"custom_param": "value"}
@@
     assert params["filters"]["type"] == "and"
-    assert len(params["filters"]["filters"]) == 2
+    assert params["filters"]["filters"] == [
+        {"type": "eq", "key": "platform", "value": "openshift"},
+        {"type": "ne", "key": "status", "value": "archived"},
+    ]
     assert "solr" not in params
```
assert "solr" not in params📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_with_filters_and_other_solr_params(self) -> None:
    """Test parameters with both filters and other solr-specific params."""
    solr = SolrVectorSearchRequest.model_validate(
        {
            "filters": {
                "filters": {
                    "type": "in",
                    "key": "version",
                    "value": ["4.14", "4.15"],
                },
                "custom_param": "value",
            },
        },
    )
    params = _build_query_params(solr=solr)
    # Filters extracted to top-level
    assert params["filters"]["type"] == "in"
    assert params["filters"]["key"] == "version"
    assert params["filters"]["value"] == ["4.14", "4.15"]
    # Other params remain under solr key
    assert params["solr"] == {"custom_param": "value"}
    assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K


def test_with_compound_filter(self) -> None:
    """Test parameters with compound AND filter."""
    solr = SolrVectorSearchRequest.model_validate(
        {
            "filters": {
                "filters": {
                    "type": "and",
                    "filters": [
                        {"type": "eq", "key": "platform", "value": "openshift"},
                        {"type": "ne", "key": "status", "value": "archived"},
                    ],
                },
            },
        },
    )
    params = _build_query_params(solr=solr)
    assert params["filters"]["type"] == "and"
    assert params["filters"]["filters"] == [
        {"type": "eq", "key": "platform", "value": "openshift"},
        {"type": "ne", "key": "status", "value": "archived"},
    ]
    assert "solr" not in params
```
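As a general illustration (not project code; the values are made up), a deep-equality assertion exposes value-level regressions that a length check silently passes:

```python
# Hypothetical expected/actual filter lists showing why deep equality matters.
expected = [
    {"type": "eq", "key": "platform", "value": "openshift"},
    {"type": "ne", "key": "status", "value": "archived"},
]
actual = [
    {"type": "eq", "key": "platform", "value": "openshift"},
    {"type": "ne", "key": "status", "value": "draft"},  # wrong value
]

# A length check alone cannot tell the two lists apart...
assert len(actual) == len(expected)
# ...while comparing the full payloads catches the regression immediately.
assert actual != expected
```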
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_vector_search.py` around lines 109 - 151, Update the
two tests test_with_filters_and_other_solr_params and test_with_compound_filter
to assert the full structured filter payload returned by _build_query_params
(not just type/key/length); for the first test, verify params["filters"] equals
the full dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.
Description
Updates the Solr vector search integration to support the new Filter API introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG results using comparison and compound filters.
Filter format examples:

- Simple: `{"filters": {"type": "eq", "key": "platform", "value": "openshift"}}`
- Compound: `{"filters": {"type": "and", "filters": [...]}}`
Note: This change requires lightspeed-providers with solr_vector_io filter support, introduced in lightspeed-core/lightspeed-providers#119
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit